A Framework for Summarization of Multi-topic Web Sites
Abstract
Web site summarization, which identifies the essential content covered in a given Web site, plays an important role in Web information management. However, straightforward summarization of an entire Web site with diverse content may lead to a summary heavily biased toward the dominant topics covered in the target Web site. In this paper, we propose a two-stage framework for effective summarization of multi-topic Web sites. The first stage identifies the main topics covered in a Web site, and the second stage summarizes each topic separately. To identify the different topics covered in a Web site, we perform coupled text- and link-based clustering. In text-based clustering, we investigate the impact of document representation and feature selection on clustering quality. In link-based clustering, we study co-citation and bibliographic coupling. We demonstrate that text-based clustering based on the selection of features with high variance over Web pages is reliable, and that outgoing links can be used to improve clustering quality if a rich set of cross links is available. Each cluster is then summarized using an extraction-based summarization system, which extracts key phrases and key sentences from the source documents to generate a summary. For the cluster summarization stage, we design and develop a classification approach in which a classifier uses statistical and linguistic features to determine the topical significance of each sentence. Finally, we evaluate the proposed system via a user study and demonstrate that the proposed clustering-summarization approach significantly outperforms the single-topic summarization approach.
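The clustering signals named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, relative term frequency as the document representation, and population variance as the feature score, and all function names are hypothetical. It shows how high-variance terms might be selected, and how co-citation and bibliographic coupling could be counted from a map of outgoing links.

```python
from collections import Counter
from statistics import pvariance


def term_frequency_rows(pages):
    """Return one {term: relative frequency} dict per page (assumed representation)."""
    rows = []
    for text in pages:
        tokens = text.lower().split()
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty pages
        rows.append({term: n / total for term, n in counts.items()})
    return rows


def top_variance_terms(rows, k):
    """Keep the k terms whose frequency varies most across pages."""
    vocab = sorted({term for row in rows for term in row})
    variance = {
        term: pvariance([row.get(term, 0.0) for row in rows]) for term in vocab
    }
    # Sort by descending variance; break ties alphabetically for determinism.
    return sorted(vocab, key=lambda term: (-variance[term], term))[:k]


def cocitation(out_links, a, b):
    """Count the pages that link to both a and b."""
    return sum(1 for targets in out_links.values() if a in targets and b in targets)


def bibliographic_coupling(out_links, a, b):
    """Count the pages that both a and b link to."""
    return len(out_links.get(a, set()) & out_links.get(b, set()))
```

A term that appears at the same rate on every page has zero variance and is dropped first, which is the intuition behind preferring high-variance features for separating topics. The two link measures differ only in direction: co-citation looks at shared incoming links, while bibliographic coupling looks at shared outgoing links.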
Similar resources
Topic-based web site summarization
Purpose: Summarization of an entire Web site with diverse content may lead to a summary heavily biased towards the site’s dominant topics. This paper presents a novel topic-based framework to address this problem. Design/methodology/approach: A two-stage framework is proposed. The first stage identifies the main topics covered in a Web site via clustering and the second stage summarizes each topi...
Exploiting relevance, coverage, and novelty for query-focused multi-document summarization
Summarization plays an increasingly important role given the exponential growth of documents on the Web. For query-focused summarization specifically, there are three challenges: (1) how to retrieve query-relevant sentences; (2) how to concisely cover the main aspects (i.e., topics) of the documents; and (3) how to balance these two requirements. Especially for the issue of relevance, many traditional sum...
AMDS: Sentence Extraction Based Proficient Framework For Multi-Document Summarization
The rapid growth of electronic documents on the World Wide Web has overloaded users trying to access information. Abstracting the primary content from numerous documents on the same topic is therefore essential. Summarizing multiple documents supports sound decision-making in less time. This paper proposes a framework named Adept Multi-Document Summarization (AMDS) fo...
Bringing Summarization to End Users: Semantic Assistants for Integrating NLP Web Services and Desktop Clients
We present PathSum, a high-performing hierarchical-topic-based single- and multi-document automatic text summarization framework. This approach leverages Bayesian nonparametric methods to model sentences as paths through a tree and to create a hierarchy of topics from the input in an unsupervised setting. We describe the generative model used to learn a topic tree based on hierarchical latent Dirich...
Text Summarization Using Cuckoo Search Optimization Algorithm
Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted considerable attention from researchers. Extractive text summarization is an important summarization method that consists of selecting the most representative sentences from the input document. When we are facing large-volume documents, the extr...
Journal:
- Knowledge Eng. Review
Volume: 24
Publication date: 2009